
Conversation

chaxu01 (Collaborator) commented Aug 27, 2025

This patch fixes an assertion failure in the latest whisper.cpp when KleidiAI is enabled:

ggml/src/ggml-backend.cpp:1088: GGML_ASSERT(*cur_backend_id != -1) failed

@github-actions github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Aug 27, 2025
@chaxu01 chaxu01 requested a review from ggerganov August 28, 2025 07:26
Comment on lines 90 to 98
static inline bool is_whisper_model(const struct ggml_tensor * op) {
    // decoder embedding sizes of the known Whisper variants
    // (tiny, base, small, medium, large)
    const int64_t n_dims = op->src[0]->ne[0];
    return n_dims == 384  || n_dims == 512  || n_dims == 768 ||
           n_dims == 1024 || n_dims == 1280;
}

ggerganov (Member)
This seems very hacky - what is the reason to add this check?

chaxu01 (Collaborator, Author) commented Aug 28, 2025

This check is added to ensure the KleidiAI backend only handles GET_ROWS operations for Whisper models, not for LLaMA models, because of Whisper's specific usage pattern: the decoder feeds GET_ROWS into MUL_MAT, which the KleidiAI kernels can accelerate (see the sketch after this list). The dimensions correspond to the known Whisper model sizes:
- 384: Whisper tiny
- 512: Whisper base
- 768: Whisper small
- 1024: Whisper medium
- 1280: Whisper large
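
For illustration, the decoder pattern in question looks roughly like this; the tensor and variable names (model.tok_emb, tokens, cur) are hypothetical, not the actual whisper.cpp code:

// Sketch of the Whisper decoder pattern: the same weight tensor feeds
// both GET_ROWS and MUL_MAT, so both ops must be supported by the
// backend for the tensor to stay repacked there.
struct ggml_tensor * cur;

// embedding lookup at the start of the decoder graph
cur = ggml_get_rows(ctx, model.tok_emb, tokens);                      // GET_ROWS

// ... decoder layers ...

// output projection reuses the same embedding weights (tied embeddings)
struct ggml_tensor * logits = ggml_mul_mat(ctx, model.tok_emb, cur);  // MUL_MAT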

ggerganov (Member)

I'm still not sure why we need this check. What is the failure case that you observe? For example, if we have a non-Whisper LLM that calls ggml_get_rows with n_dims = 256, why would we not want to use this path?

chaxu01 (Collaborator, Author)

The failure case we observed occurs in whisper.cpp after PR #15118 was merged. Specifically, a GGML_ASSERT is triggered when the newly added GET_ROWS support check fails for the KleidiAI backend.

The main reason for enabling GET_ROWS support for Whisper in KleidiAI is that Whisper reuses the same weight tensor for both GET_ROWS and MUL_MAT in the decoder. If GET_ROWS isn't supported, it falls back to CPU — which in turn forces the shared tensor to fall back as well, disabling KleidiAI acceleration for MUL_MAT.

In contrast, LLaMA and similar models don’t reuse weights across these ops, so enabling GET_ROWS support for them would introduce unnecessary repacking without any performance gain.

The n_dims check is a lightweight way to detect Whisper models based on known decoder embedding sizes (384–1280). Happy to revisit or generalize this approach if other models benefit from the same pattern in the future.
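
For context, here is a minimal sketch of how such a heuristic would gate the backend's op-support check; the surrounding function and its exact signature are assumptions made for illustration, not the actual KleidiAI source:

// Hypothetical op-support hook: accept GET_ROWS only when the source
// tensor looks like a Whisper decoder embedding, so the shared weight
// stays on the KleidiAI backend for the subsequent MUL_MAT.
static bool supports_op(const struct ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_MUL_MAT:
            return true; // matmul benefits from repacked weights
        case GGML_OP_GET_ROWS:
            return is_whisper_model(op);
        default:
            return false;
    }
}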

chaxu01 (Collaborator, Author)

@ggerganov, just wanted to follow up on this one.

The main goal here is to ensure Whisper models benefit from KleidiAI acceleration by keeping GET_ROWS + MUL_MAT on the backend, while avoiding unnecessary repacking for LLaMA and similar models. The n_dims check was meant as a lightweight heuristic for Whisper’s known decoder embedding sizes. Happy to adjust the approach if you prefer a different way of handling Whisper-specific GET_ROWS acceleration.

ggerganov (Member)

I see, I understand the root issue now: the same tensor is being used both for GET_ROWS and MUL_MAT. I don't think this is the right solution, though. We should not make such assumptions about the tensor shapes in the underlying implementation, as it is very fragile.

> In contrast, LLaMA and similar models don’t reuse weights across these ops, so enabling GET_ROWS support for them would introduce unnecessary repacking without any performance gain.

Do you have an estimate of how large the negative impact is in such cases?

In hindsight, the logic for repacking tensors used with GET_ROWS was probably premature. The repacking mechanism was designed with a single op in mind: if the single op that uses the tensor would benefit from repacking, then we allow repacking.

In contrast, for this use case, the condition to repack now also depends on the next ops in the graph. This is something we overlooked when implementing the GET_ROWS repacking support, and we probably have to take a step back and think of a better way to support these use cases.
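
To make that dependency concrete, a graph-aware repack condition would have to scan the other nodes, along these lines (a hypothetical helper, not existing ggml code):

// Only repack a weight used by GET_ROWS if some node in the graph also
// consumes that same weight via MUL_MAT, i.e. the decision depends on
// the rest of the graph, not just the single op at hand.
static bool weight_also_feeds_mul_mat(const struct ggml_cgraph * gf,
                                      const struct ggml_tensor * weight) {
    for (int i = 0; i < gf->n_nodes; i++) {
        const struct ggml_tensor * node = gf->nodes[i];
        if (node->op == GGML_OP_MUL_MAT && node->src[0] == weight) {
            return true;
        }
    }
    return false;
}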

chaxu01 (Collaborator, Author)

I agree that the shape-based check is fragile and not the right direction long term.

I have done some benchmarks. Since the repacking happens during model load, I wasn’t able to observe any negative impact on an M4 Pro when enabling GET_ROWS for LLaMA and similar models; even the model load time didn’t show any noticeable change in llama.cpp.

Given that, I think it’s better not to special-case whisper.cpp at all for now. I’ll update the patch accordingly unless you have other thoughts, and I’ll be happy to revisit this once the repack support is updated to better handle cases like Whisper’s shared tensor pattern.

./bin/llama-bench -m ./models/llama-2-7b-q4_0.gguf -ngl 0 -t 4

W/O GET_ROWS
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CPU        |       4 |           pp512 |         85.01 ± 0.52 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CPU        |       4 |           tg128 |         40.08 ± 0.92 |

build: d93dcc27 (6287)

W/ GET_ROWS
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CPU        |       4 |           pp512 |         85.60 ± 0.25 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | CPU        |       4 |           tg128 |         42.49 ± 0.63 |

build: d93dcc27 (6287)

ggerganov (Member)

> Given that, I think it’s better not to special-case whisper.cpp at all for now. I’ll update the patch accordingly unless you have other thoughts, and I’ll be happy to revisit this once the repack support is updated to better handle cases like Whisper’s shared tensor pattern.

Ok, let's avoid the special cases for now, since the impact does not seem to be significant. If we don't find cases in which the impact is measurable, then I think it would be OK to keep the GET_ROWS repacking as it is.

@chaxu01 chaxu01 merged commit 2b3efea into ggml-org:master Sep 11, 2025
48 checks passed
@chaxu01 chaxu01 deleted the feature/get-rows-assert branch October 7, 2025 14:36
